An alternative pattern for objects in Python
Python is mostly a very nice language that makes for smooth and readable
code. An exception is the common way of coding classes. First, it clutters methods
with self
references. Second, if you choose to use the double underscore
convention for indicating a “private” attribute, each reference to it is
preceded by self.__
. This can destroy much of the readability. If
you doubt me, take a look at ./classic/VertexPQ.py for a particularly self.__
infested example.
Here's a prototype example of a Python class (which doesn't look too bad, because it's small and only has one attribute):
class T: def __init__(self, x): self.__x = x def print_x(self): print(self.__x) def times_x(self, multiplier): return multiplier * self.__x def change_x(self, new_x): self.__x = new_x
Some people seem to think referencing through self
is a good thing, and do the
same in, e.g., Java, when they don't even have to, writing this.x
rather than
plain x
to reference an attribute. In my view, that's clumsy, and stops the
flow of reading. When you want x you should just have to think x, not “the
attribute called x in the object for which the current code was invoked”. If
it's important to remember that x is an object attribute, you should instead
give it a name that makes that obvious in a natural way.
The double underscore naming convention is a way of getting around that Python doesn't have a simple way of limiting access to attributes. Referencing double-underscore names at least makes access to them look conspicuous (even more so since Python actually name-mangles even more garbage onto them, which you have to mimic when referencing them from outside). Fine, but the problem is references in class methods also look conspicuous (and get difficult to read), which of course they aren't.
I'm not alone in not being crazy about double underscores. But just getting rid of them is no solution, because you want some way of keeping outside code from leisurely referencing “private” attributes. (Sure, you can use one underscore instead of two – which actually appears to be what the Python tutorial suggests, except when you want to protect agains name clashes that might result from inheritance – but that just halves the problem at best, it doesn't remove it.) Students are often taught that this naming is the proper way to do privates in Python.
The solution to both of our problems
Another language that lacks a language construct to hide private attributes from the outside is Javascript. The way I've learned to get around that, from Javascript: The Good Parts by Douglas Crockford, is to skip classes (which were badly designed in Javascript anyway) and use the closure of the constructor function as storage for the object.
In Python, I suggest keeping classes for objects, but still use the closure of the constructor pattern. Here's a first attempt, which uses the pattern completely manually, at a replacement for the example above.
class T: def __init__(self, x): def print_x(): print(x) def times_x(multiplier): return multiplier * x def change_x(new_x): nonlocal x x = new_x self.print_x = print_x self.times_x = times_x self.change_x = change_x
Most of the clutter is gone, and as a bonus, x
is now truly private: there is
no way it can be accessed by name from the outside, except through the methods
intended for it. (You can still hack into it if you know it's position in the
__closure__
of the methods.)
However, the three assignments at the end that “export” the methods are annoying. We get rid of them by the use of a Python decorator.
from clo_decorators import clo_method class T: def __init__(self, x): @clo_method(self) def print_x(): print(x) @clo_method(self) def times_x(multiplier): return multiplier * x @clo_method(self) def change_x(new_x): nonlocal x x = new_x
You find clo_method
in my ./closure/clo_decorators.py. That also contains a
clo_property
that can be used to let users of the class reference properties
as if they were regular object attributes, but access to them go through
specified accessor methods. It's likely that I didn't code this in the best
possible way, but it works.
When T
is used the way it's supposed to, object closure implementation variant
behaves exactly the classic self.__
implementation. There may be performance
differences, however, which we get to below.
A larger example and the performance aspect
I teach a basic algorithms class based on the book Algorithms, 4th Edition by
Sedgewick and Wayne, which has most of its examples in Java. Sometimes I get
students who are not used to writing in Java, and I allow them to use
other languages for their assignments. To help the students that code in Python
with a graph algorithm assignment, I ported a few of the graph classes from the
book to Python. To demonstrate the closure object pattern, I've done this both
with the classic Python object pattern and with the closure object pattern using
my clo_decorators
. You can find the code here:
- Classic implementation
- Closure implementation
- Demo/test examples that work with either of the implementations
In particular, have a glance at the closure binary heap implementation ./closure/VertexPQ.py compared to the classic implementation ./classic/VertexPQ.py to get an idea of how much cleaner a relatively complex data structure gets with the closure pattern.
The example programs ./demo/test_bfs_random.py and ./demo/test_dijkstra_random.py each take two integer parameters to generate random graphs that they run the algorithms on: the first is the number of vertices, the second is roughly the out-degree of each vertex. To stress test the performance of the implementations, you can try them with large graphs. Here are some sample runs of the Dijkstra demo (the one with the heaviest work) on my laptop:
demo$ export PYTHONPATH=../classic demo$ time py test_dijkstra_random.py 100000 5 real 0m4.503s user 0m4.419s sys 0m0.059s demo$ export PYTHONPATH=../closure demo$ time py test_dijkstra_random.py 100000 5 real 0m3.525s user 0m3.452s sys 0m0.058s
The results are consistent: on a graph with a hundred thousand vertices, and about five times as many edges, time went down from about 4.5 to 3.5 seconds when switching to the closure pattern. I expect that this is because access to local variables inside the methods is faster than the corresponding access to object attributes. (I haven't verified this, but it makes sense.) There's no guarantee that the closure pattern always makes the code faster, there might be other examples where it has the opposite effect, but at least this demonstrates that there's the potential of a performance gain.
Of course, you don't have to choose one or the other for everything. I myself intend to use the classic object pattern in parallel with the closure pattern wherever I find it the more suitable.
Thanks
Comments by Jonas Nockert and Peter Lindberg prompted me to edit some details in the text.